The overall quality scores of the reads are high, but some reads have quality scores
below 20 (that is, below 99% base-call accuracy) toward their ends. We can trim these
low-quality bases from the ends of the reads. The demultiplexed sequence length summary
at the bottom of the Interactive Quality Plot tab shows that all reads have the same
length (275 bases). This table helps us decide whether we need to make the reads equal
in length.
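To see where the "20 means 99% accuracy" figure comes from, here is a minimal Python sketch of the standard Phred relation Q = -10 log10(P), where P is the probability that a base call is wrong. The function names are our own, and the decoder assumes the common Phred+33 FASTQ encoding; check which encoding your own data uses.

```python
def phred_to_error_probability(q: float) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def decode_phred33(quality_string: str) -> list:
    """Decode a FASTQ quality string (assumption: Phred+33 encoding)."""
    return [ord(ch) - 33 for ch in quality_string]


# Print the base-call accuracy for a few commonly quoted thresholds.
for q in (10, 20, 30, 40):
    accuracy = 1 - phred_to_error_probability(q)
    print(f"Q{q}: {accuracy:.2%} base-call accuracy")
```

A score of 20 therefore corresponds to an expected one error in 100 base calls, and each additional 10 points reduces the error probability tenfold.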
If you decide to filter out reads with poor quality scores, you can use the
“quality-filter” plugin with the “q-score” method. However, this applies to single-end
reads. For paired-end reads, you can join the forward and reverse reads and then run
“quality-filter q-score” on the merged reads; this will be discussed later with
clustering. Denoising methods also have their own way of filtering low-quality reads,
as we will see soon. If your data consists of single-end reads, you can use
“quality-filter q-score” to remove low-quality reads with the following command:
qiime quality-filter q-score \
  --i-demux demux.qza \
  --p-min-quality 20 \
  --p-quality-window 5 \
  --p-min-length-fraction 0.8 \
  --p-max-ambiguous 0 \
  --o-filtered-sequences demux-filtered.qza \
  --o-filter-stats demux-filter-stats.qza
If these parameters are omitted from the command above, the defaults are used:
“--p-min-quality 4”, “--p-quality-window 3”, “--p-min-length-fraction 0.75”, and
“--p-max-ambiguous 0”. For more information about these parameters, run
“qiime quality-filter q-score --help”. We will discuss this in more detail with
clustering.
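To build intuition for how these four parameters interact, the following is a simplified Python sketch of q-score style filtering applied to a single read. It is not the plugin's actual code: the function name is hypothetical, and the exact boundary handling (for example, whether a run must be longer than the window or merely equal to it) is our assumption based on the parameter descriptions.

```python
from typing import List, Optional, Tuple


def q_score_filter_read(
    bases: str,
    quals: List[int],
    min_quality: int = 4,
    quality_window: int = 3,
    min_length_fraction: float = 0.75,
    max_ambiguous: int = 0,
) -> Optional[Tuple[str, List[int]]]:
    """Return the (possibly truncated) read, or None if it is discarded."""
    run_length = 0
    cut = len(bases)  # default: keep the full read
    for i, q in enumerate(quals):
        if q < min_quality:
            run_length += 1
            if run_length > quality_window:
                # Assumption: truncate at the first base of the low-quality run.
                cut = i - run_length + 1
                break
        else:
            run_length = 0

    kept_bases, kept_quals = bases[:cut], quals[:cut]

    # Discard reads that became too short relative to their original length.
    if len(kept_bases) < min_length_fraction * len(bases):
        return None
    # Discard reads with too many ambiguous base calls (N).
    if kept_bases.upper().count("N") > max_ambiguous:
        return None
    return kept_bases, kept_quals


# A 40-base read whose last 6 bases have low quality, using the settings above.
read = "ACGT" * 10
quals = [30] * 34 + [10] * 6
result = q_score_filter_read(read, quals, min_quality=20, quality_window=5,
                             min_length_fraction=0.8)
print(len(result[0]))  # 34
```

In this example the run of six low-quality bases exceeds the window of 5, so the read is truncated to 34 bases; it is kept because 34 is at least 80% of the original 40 bases. A shorter read with the same low-quality tail would fall below that fraction and be discarded.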
If the reads contain PCR primer sequences or any other non-biological sequences, you
can remove them using the “cutadapt” plugin. Again, this is not applicable to our yoga
data, but if you had sequences with primers at this stage of the analysis, it would
FIGURE 7.8 Per base quality plots of the yoga data.